Add example for `PartitionedFile` schema #22809
```rust
//! (file: query_http_csv.rs, desc: Query CSV files via HTTP)
//!
//! - `remote_catalog`
//! (file: remote_catalog.rs, desc: Interact with a remote catalog)
```
Please add an entry for partitioned_file_schema
```rust
let table_schema = Arc::new(Schema::new(vec![
    Field::new("a", DataType::Int32, true),
    Field::new("b", DataType::Float64, true),
```
Please add a comment explaining the purpose of field `b`. It is not mentioned at https://github.com/apache/datafusion/pull/22809/changes#diff-5097924e81226127006feb2aab9ff70726bf3ad7d6bb5d6d73a7a53f0412636bR45
I added a comment for this field; let me know if it makes sense.
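For context on field `b`: the example's Parquet file contains only column `a`, so `b` exists only in the table schema and, in the run output further down this thread, comes back as all nulls. The following is a minimal std-only sketch of that projection step, assuming a simplified columnar model. `Column`, `DataType`, `Field`, `null_column`, and `adapt_to_table_schema` are hypothetical names for illustration, not DataFusion or Arrow APIs.

```rust
/// Hypothetical, simplified stand-in for an Arrow array: one vector of
/// optional values per column, with `None` playing the role of null.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Int32(Vec<Option<i32>>),
    Float64(Vec<Option<f64>>),
}

/// Hypothetical logical types, limited to the two used by the example.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Float64,
}

/// Hypothetical table-schema field: a name and a type (nullability omitted).
struct Field {
    name: &'static str,
    data_type: DataType,
}

/// Builds an all-null column of the given type and length.
fn null_column(data_type: DataType, len: usize) -> Column {
    match data_type {
        DataType::Int32 => Column::Int32(vec![None; len]),
        DataType::Float64 => Column::Float64(vec![None; len]),
    }
}

/// Projects the columns read from one file onto the table schema, in table
/// order: columns present in the file are passed through unchanged, and
/// columns the file lacks (like `b`) become all-null columns of `num_rows`.
fn adapt_to_table_schema(
    table_schema: &[Field],
    file_columns: &[(&str, Column)],
    num_rows: usize,
) -> Vec<Column> {
    table_schema
        .iter()
        .map(|field| {
            file_columns
                .iter()
                .find(|(name, _)| *name == field.name)
                .map(|(_, column)| column.clone())
                .unwrap_or_else(|| null_column(field.data_type, num_rows))
        })
        .collect()
}
```

Feeding it a table schema of `a: Int32, b: Float64` and a file holding only `a` with five rows yields `a` unchanged plus a five-row all-null `b`, which matches the shape of the batches printed below.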
alamb left a comment
Thank you @fpetkovski and @martin-g
I ran it locally like this:
```
andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ cargo run --profile=ci --example data_io -- partitioned_file_schema
Finished `ci` profile [unoptimized] target(s) in 0.25s
Running `target/ci/examples/data_io partitioned_file_schema`
RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Int32, nullable: true }, Field { name: "b", data_type: Float64, nullable: true }], metadata: {} }, columns: [PrimitiveArray<Int32>
[
1,
2,
3,
4,
5,
], PrimitiveArray<Float64>
[
null,
null,
null,
null,
null,
]], row_count: 5 }
RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Int32, nullable: true }, Field { name: "b", data_type: Float64, nullable: true }], metadata: {} }, columns: [PrimitiveArray<Int32>
[
1,
2,
3,
4,
5,
], PrimitiveArray<Float64>
[
null,
null,
null,
null,
null,
]], row_count: 5 }
Got schema error: ParquetError(ArrowError("Incompatible supplied Arrow schema: data type mismatch for field a: requested Int64 but found Int32"))
```

I took the liberty of pushing a commit to your branch to resolve a CI error: https://github.com/apache/datafusion/actions/runs/27138078890/job/80100749669?pr=22809
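The last line of that run shows the failure case: a supplied schema that declares `a` as `Int64` is rejected with a schema error because the file stores `a` as `Int32`. Below is a minimal sketch of a strict check of that kind, assuming exact type equality is required for fields present in both schemas. `check_supplied_schema` and this `DataType` enum are hypothetical; this is not the Parquet reader's actual implementation, and the message only imitates the wording of the error above. Fields missing from the file are simply skipped in this sketch.

```rust
/// Hypothetical logical types; only the ones needed for this illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
}

/// Hypothetical strict compatibility check between a caller-supplied schema
/// and the schema actually stored in a file. Each schema is a list of
/// (name, type) pairs. Fields present in both must have exactly the same
/// type; fields missing from the file are skipped in this sketch.
fn check_supplied_schema(
    supplied: &[(&str, DataType)],
    file: &[(&str, DataType)],
) -> Result<(), String> {
    for (name, requested) in supplied {
        // Look up the field with the same name in the file's own schema.
        if let Some((_, found)) = file.iter().find(|(file_name, _)| file_name == name) {
            if found != requested {
                return Err(format!(
                    "data type mismatch for field {name}: requested {requested:?} but found {found:?}"
                ));
            }
        }
    }
    Ok(())
}
```

With a file schema of `a: Int32`, supplying `a: Int32, b: Float64` passes, while supplying `a: Int64` returns the mismatch error.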
```rust
/// already known, it can be supplied up front so this inference step is
/// skipped, saving an I/O round trip and metadata parse per file.
///
/// The example writes a small Parquet file with a single `Int32` column `a` and
```
Thank you -- this is a nice description of what is going on
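The doc comment's point is that a schema known in advance lets the reader skip per-file inference. A minimal sketch of that decision, assuming a hypothetical `FileHandle` whose `read_footer_schema` method stands in for the I/O round trip and metadata parse; `Schema`, `FileHandle`, and `resolve_schema` are illustrative names, not DataFusion APIs.

```rust
use std::cell::Cell;

/// Hypothetical schema: (column name, type name) pairs.
#[derive(Debug, Clone, PartialEq)]
struct Schema(Vec<(&'static str, &'static str)>);

/// Hypothetical file handle; `footer_reads` counts simulated metadata reads.
struct FileHandle {
    footer_schema: Schema,
    footer_reads: Cell<u32>,
}

impl FileHandle {
    /// Stands in for the I/O round trip and metadata parse that schema
    /// inference costs for each file.
    fn read_footer_schema(&self) -> Schema {
        self.footer_reads.set(self.footer_reads.get() + 1);
        self.footer_schema.clone()
    }
}

/// Returns the caller-supplied schema when there is one, and only falls back
/// to reading the file's metadata when the schema is unknown.
fn resolve_schema(file: &FileHandle, known: Option<&Schema>) -> Schema {
    match known {
        Some(schema) => schema.clone(),
        None => file.read_footer_schema(),
    }
}
```

Resolving the same file once without and once with a known schema leaves the simulated read counter at one: only the inference path touches the file.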
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## Which issue does this PR close?

Addresses the suggestion in #22360 (review) to add an example for specifying an Arrow schema for a `PartitionedFile`.

## What changes are included in this PR?

* Add an example in `datafusion-examples/examples/data_io/partitioned_file_schema.rs`.

## Are these changes tested?

Tested with

```bash
cd datafusion-examples/examples
cargo run --example data_io -- partitioned_file_schema
```

## Are there any user-facing changes?

No user-facing changes.

cc @alamb